Hierarchical multilayer perceptron based language identification
Abstract
Automatic language identification (LID) systems generally exploit acoustic knowledge, possibly enriched by explicit language-specific phonotactic or lexical constraints. This paper investigates a new LID approach based on hierarchical multilayer perceptron (MLP) classifiers, in which the first layer is a “universal phoneme set MLP classifier”. The resulting (multilingual) phoneme posterior sequence is fed into a second MLP that takes a larger temporal context into account. The second MLP can implicitly learn and exploit different types of patterns that are useful for LID, such as phoneme confusions and phonotactics. We investigate the viability of the proposed approach by comparing it against two standard approaches that use phonotactic and lexical constraints, with the universal phoneme set MLP classifier as the emission probability estimator. On SpeechDat(II) datasets of five European languages, the proposed approach yields significantly better performance than the two standard approaches.
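The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the weights are random and untrained, and the feature dimension, phoneme set size, hidden layer size, context widths, and the utterance-level decision rule (averaging frame log posteriors) are all assumptions made for illustration. Only the count of five languages comes from the abstract. The helper names `mlp_forward`, `stack_context`, and `init_mlp` are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, w1, b1, w2, b2):
    # One-hidden-layer MLP whose softmax outputs act as posterior estimates.
    hidden = np.tanh(x @ w1 + b1)
    return softmax(hidden @ w2 + b2)

def stack_context(frames, half_width):
    # Concatenate each frame with its neighbours; edge frames are repeated.
    padded = np.pad(frames, ((half_width, half_width), (0, 0)), mode="edge")
    windows = [padded[i:i + len(frames)] for i in range(2 * half_width + 1)]
    return np.concatenate(windows, axis=1)

def init_mlp(rng, n_in, n_hidden, n_out):
    # Random, untrained weights: this sketch only shows the data flow.
    return (rng.normal(0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0, 0.1, (n_hidden, n_out)), np.zeros(n_out))

rng = np.random.default_rng(0)

# Assumed sizes, except the five languages mentioned in the abstract.
N_FRAMES, N_FEATS, N_PHONEMES, N_LANGUAGES = 200, 39, 100, 5
SHORT_CTX, LONG_CTX = 4, 15  # half-widths in frames (assumed)

# Stand-in for the acoustic features of one utterance.
features = rng.normal(size=(N_FRAMES, N_FEATS))

# Stage 1: universal phoneme set classifier on a short acoustic context.
mlp1 = init_mlp(rng, N_FEATS * (2 * SHORT_CTX + 1), 64, N_PHONEMES)
phoneme_post = mlp_forward(stack_context(features, SHORT_CTX), *mlp1)

# Stage 2: language classifier on a longer window of phoneme posteriors.
mlp2 = init_mlp(rng, N_PHONEMES * (2 * LONG_CTX + 1), 64, N_LANGUAGES)
language_post = mlp_forward(stack_context(phoneme_post, LONG_CTX), *mlp2)

# Assumed decision rule: average the frame-level log posteriors.
utterance_scores = np.log(language_post).mean(axis=0)
predicted_language = int(np.argmax(utterance_scores))
```

Because the second stage sees a window of posterior vectors rather than raw acoustics, any regularity in how phoneme posteriors co-occur over time is available to it as input, which is the property the abstract relies on.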
Similar resources
Use of recurrent network for unknown language rejection in language identification system
In the past, we attempted to use a multilayer perceptron neural network to prevent unknown-language inputs from being misidentified as one of the target languages in a language identification system. However, the multilayer perceptron neural network could not utilize the temporal information in the utterances. Results show that with the use of phonemic unigram as input f...
Multilayer Perceptron Based Hierarchical Acoustic Modeling for Automatic Speech Recognition
UTA DLNLP at SemEval-2016 Task 12: Deep Learning Based Natural Language Processing System for Clinical Information Identification from Clinical Notes and Pathology Reports
We propose a deep neural network based natural language processing system for clinical information (such as time information, event spans, and their attributes) extraction from raw clinical notes and pathology reports. Our approach uses the context words and their part-of-speech tags and shape information as features. We utilize the temporal (1D) convolution neural network to learn the hidden fe...
Unknown language rejection in language identification system
The number of languages in the world is much larger than the number of target languages that current language identification systems can handle. Therefore, we propose here the use of a multilayer perceptron neural network as a means to prevent those unknown language inputs from being misidentified as one of the target languages. We consider not only the target language identification rate but also ...
Classifier Stacking for Native Language Identification
This paper reports our contribution (team WLZ) to the NLI Shared Task 2017 (essay track). We first extract lexical and syntactic features from the essays, perform feature weighting and selection, and train linear support vector machine (SVM) classifiers, each on an individual feature type. The outputs of the base classifiers, as probabilities for each class, are then fed into a multilayer perceptron ...